17 research outputs found

    Statistical modelling of spatio-temporal dependencies in NGS data

    Identifying overlapping terrorist cells from the Noordin Top actor-event network

    Actor-event data are common in sociological settings, where one records the pattern of attendance of a group of social actors at a number of events. We focus on 79 members of the Noordin Top terrorist network, who were monitored attending 45 events. The attendance or non-attendance of the terrorists at events defines the social fabric, such as group coherence and social communities. The aim of the analysis of such data is to learn about the affiliation structure. Actor-event data are often transformed to actor-actor data in order to be further analysed by network models, such as stochastic block models. This transformation and such analyses lead to a natural loss of information, particularly when one is interested in identifying possibly overlapping subgroups or communities of actors on the basis of their attendance at events. In this paper we propose an actor-event model for overlapping communities of terrorists, which simplifies interpretation of the network. We propose a mixture model with overlapping clusters for the analysis of the binary actor-event network data, called manet, and develop a Bayesian procedure for inference. After a simulation study, we show how this analysis of the terrorist network has clear interpretative advantages over the more traditional approaches of affiliation network analysis. Comment: 24 pages, 5 figures; related R package (manet) available on CRA
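
The information loss mentioned in this abstract can be illustrated with a small sketch. The function name `co_attendance` and both toy attendance matrices are hypothetical and are not taken from the paper or from the manet package; the point is only that two different actor-event patterns can collapse to the same actor-actor matrix.

```python
def co_attendance(attendance):
    """Project a binary actor-by-event matrix onto an actor-by-actor matrix.

    Off-diagonal entry (i, j) counts the events attended by both actor i
    and actor j. The diagonal is set to 0, as is usual for a network.
    """
    n = len(attendance)
    return [
        [
            0 if i == j else sum(a * b for a, b in zip(attendance[i], attendance[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Rows are actors, columns are events (toy data, assumed for illustration).
one_meeting = [[1], [1], [1]]        # all three actors attend a single event
three_meetings = [[1, 0, 1],         # three events, each attended by one pair
                  [1, 1, 0],
                  [0, 1, 1]]

# Both attendance patterns give the same actor-actor matrix, so the
# projection cannot distinguish one group of three from three pairs.
same_projection = co_attendance(one_meeting) == co_attendance(three_meetings)
```

Here `same_projection` is true: once the events are projected away, a single three-person gathering and three separate pairwise meetings look identical.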

    Mixtures of multivariate generalized linear models with overlapping clusters

    With the advent of ubiquitous monitoring and measurement protocols, studies have started to focus more and more on complex, multivariate and heterogeneous datasets. In such studies, multivariate response variables are drawn from a heterogeneous population, often in the presence of additional covariate information. In order to deal with this intrinsic heterogeneity, regression analyses have to be clustered for different groups of units. Up until now, mixture model approaches have assigned units to distinct and non-overlapping groups. However, these units often exhibit more complex organization and clustering. It is our aim to define a mixture of generalized linear models with overlapping clusters of units. Crucially, this involves an overlap function that maps the coefficients of the parent clusters into the coefficients of the multiple allocation units. We present a computationally efficient MCMC scheme that samples the posterior distribution of the parameters in the model. An example on a two-mode network study shows details of the implementation in the case of a multivariate probit regression setting. A simulation study shows the overall performance of the method, whereas an illustration of the voting behaviour on the US Supreme Court shows how the 9 justices split into two overlapping sets of justices. Comment: 24 pages, 3 figures
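
The idea of an overlap function can be sketched as follows. The abstract does not say which overlap function the authors use, so the averaging rule below is purely an assumption chosen for illustration, and the name `overlap_coefficients` is hypothetical.

```python
def overlap_coefficients(parent_coefficients, membership):
    """Combine parent-cluster coefficient vectors for one unit.

    parent_coefficients: list of coefficient vectors, one per parent cluster.
    membership: list of 0/1 flags saying which clusters the unit belongs to.

    ASSUMPTION: the overlap function is the element-wise mean of the
    coefficients of the clusters the unit belongs to. The paper may use
    a different function.
    """
    active = [beta for beta, z in zip(parent_coefficients, membership) if z]
    if not active:
        raise ValueError("a unit must belong to at least one cluster")
    return [sum(values) / len(active) for values in zip(*active)]


betas = [[1.0, 0.0], [3.0, 2.0]]                    # two parent clusters (toy values)
single = overlap_coefficients(betas, [1, 0])        # unit in cluster 1 only
multiple = overlap_coefficients(betas, [1, 1])      # unit in both clusters
```

Under this assumed rule, a unit in one cluster simply inherits that cluster's coefficients, while a multiple allocation unit gets a blend of its parents' coefficients.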

    Semiparametric finite mixture of regression models with Bayesian P-splines

    A semiparametric finite mixture of regression models is defined, with concomitant information assumed to influence both the component weights and the conditional means. The contribution of a concomitant variable is flexibly specified as a smooth function represented by cubic splines. A Bayesian estimation procedure is proposed and an empirical analysis of the baseball salaries dataset is presented.

    Bayesian variable selection in linear regression models with non-normal errors

    This paper addresses two crucial issues in multiple linear regression analysis: (i) error terms whose distribution is non-normal because of asymmetry of the response variable and/or data coming from heterogeneous populations; (ii) selection of the regressors that effectively contribute to explaining patterns in the observations and are relevant for predicting the dependent variable. A solution to the first issue can be obtained through an approach in which the distribution of the error terms is modelled using a finite mixture of Gaussian distributions. In this paper we use this approach to specify a Bayesian linear regression model with non-normal errors; furthermore, by embedding Bayesian variable selection techniques in the specification of the model, we simultaneously perform estimation and variable selection. These tasks are accomplished by sampling from the posterior distributions associated with the model. The performance of the proposed methodology is evaluated through the analysis of simulated datasets in comparison with other approaches. The results of an analysis based on a real dataset are also provided. The methods developed in this paper perform well when the distribution of the error terms is characterised by heavy tails, skewness and/or multimodality.
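
To see why a finite mixture of Gaussians can represent non-normal errors, the following minimal sketch evaluates a mixture density. The function name `mixture_pdf` and the component weights, means and standard deviations are illustrative assumptions, not values from the paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a finite mixture of Gaussians evaluated at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


# A two-component mixture with well-separated means (toy values) has two
# modes, which a single Gaussian error term cannot reproduce.
weights, means, sds = [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5]
density_at_mode = mixture_pdf(-2.0, weights, means, sds)
density_at_centre = mixture_pdf(0.0, weights, means, sds)
```

With these toy values the density near each component mean is far higher than at zero, so the error distribution is bimodal. Unequal weights or variances would give skewed or heavy-tailed shapes instead.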

    Bayesian variable selection in linear regression models with non-normal errors

    Multiple linear regression is a prime statistical tool used to discover potential relationships between an outcome and some explanatory variables of interest. One of the commonly required assumptions is that the error terms in the model are Gaussian. Instead of assuming normality, an alternative is to use a finite mixture of normal distributions, allowing for a more flexible definition of the heterogeneity structure of the data. We use this approach to develop a Bayesian linear regression model with non-normal errors, and through variable selection we focus on finding active predictors that effectively contribute to explaining patterns in the observations.

    Fused graphical lasso for brain networks with symmetries

    Neuroimaging is the growing area of neuroscience devoted to producing data with the goal of capturing processes and dynamics of the human brain. We consider the problem of inferring the brain connectivity network from time-dependent functional magnetic resonance imaging (fMRI) scans. To this aim we propose the symmetric graphical lasso, a penalized likelihood method with a fused-type penalty function that takes explicit account of the natural symmetrical structure of the brain. Symmetric graphical lasso allows one to learn simultaneously both the network structure and a set of symmetries across the two hemispheres. We implement an alternating direction method of multipliers algorithm to solve the corresponding convex optimization problem. Furthermore, we apply our methods to estimate the brain networks of two subjects, one healthy and one affected by mental disorder, and to compare them with respect to their symmetric structure. The method applies once the temporal dependence characterizing fMRI data has been accounted for, and we compare the impact of different detrending techniques on the estimated brain networks. Although we focus on brain networks, symmetric graphical lasso is a tool which can be more generally applied to learn multiple networks in a context of dependent samples.
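
A fused-type penalty of the kind described here can be sketched as below. The abstract does not give the exact form of the penalty, so this is an assumed version: a lasso term on off-diagonal entries plus a term penalizing differences between each entry and its mirror image in the other hemisphere. The names `symmetric_fused_penalty`, `mirror`, `lam_sparse` and `lam_fused` are hypothetical.

```python
def symmetric_fused_penalty(theta, mirror, lam_sparse, lam_fused):
    """Evaluate an assumed sparsity-plus-symmetry penalty on a precision matrix.

    theta: square symmetric matrix as a list of lists.
    mirror: mirror[i] is the index of node i's counterpart in the other
            hemisphere.
    The sums run over all ordered off-diagonal pairs, so each pairing is
    counted more than once; this only rescales the two tuning parameters.
    """
    p = len(theta)
    pairs = [(i, j) for i in range(p) for j in range(p) if i != j]
    sparse = sum(abs(theta[i][j]) for i, j in pairs)
    fused = sum(abs(theta[i][j] - theta[mirror[i]][mirror[j]]) for i, j in pairs)
    return lam_sparse * sparse + lam_fused * fused


mirror = [2, 3, 0, 1]  # nodes 0, 1 on one side; nodes 2, 3 their counterparts

# Same edge in both hemispheres: only the sparsity term contributes.
symmetric_net = [[0.0, 0.5, 0.0, 0.0],
                 [0.5, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 0.5],
                 [0.0, 0.0, 0.5, 0.0]]

# Edge present on one side only: the fused term adds a symmetry cost.
asymmetric_net = [[0.0, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]]

penalty_symmetric = symmetric_fused_penalty(symmetric_net, mirror, 1.0, 1.0)
penalty_asymmetric = symmetric_fused_penalty(asymmetric_net, mirror, 1.0, 1.0)
```

In this toy case the asymmetric network has fewer edges yet receives the larger penalty, which is how a fused term of this kind would encourage estimates that are symmetric across hemispheres.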